[MXNET-779] Add DLPack Transformation API #12047
Conversation
This is a good change; I will make a few additional comments below.
python/mxnet/_ctypes/ndarray.py
Outdated
```diff
@@ -31,10 +31,10 @@

 class NDArrayBase(object):
     """Base data structure for ndarray"""
-    __slots__ = ["handle", "writable"]
+    __slots__ = ["handle", "writable", "dlpack_handle"]
```
Do not put dlpack_handle inside it; instead, directly return a PyCapsule that contains the handle.
python/mxnet/cython/ndarray.pyx
Outdated
```cython
def __dealloc__(self):
    CALL(MXNDArrayFree(self.chandle))
    if self.cdlpack_handle:
        CALL(MXNDArrayCallDLPackDeleter(self.cdlpack_handle))
```
Do not put dlpack_handle inside the object.
src/ndarray/ndarray.cc
Outdated
```cpp
struct NDArrayDLManager {
  NDArray handle;       // ref NDArray
  DLManagedTensor tensor;
  TShape strides;       // store variable strides
```
DLPack allows the strides field to be nullptr; for MXNet we do not need to store strides, as we only support compact tensors for now.
During conversion to an MXNet tensor, check the strides field to see whether the tensor is compact.
I will fix it. It seems that PyTorch's from_dlpack doesn't allow strides to be nullptr.
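For reference, a minimal sketch of such a compactness check on a DLTensor (illustrative only, not the PR's exact code; the helper name is made up):

```cpp
#include <dlpack/dlpack.h>

// Returns true if `strides` is absent or describes a row-major compact layout.
inline bool IsCompactDLTensor(const DLTensor& t) {
  if (t.strides == nullptr) return true;  // DLPack allows nullptr for compact tensors
  int64_t expected = 1;
  for (int i = t.ndim - 1; i >= 0; --i) {
    if (t.shape[i] != 1 && t.strides[i] != expected) return false;
    expected *= t.shape[i];
  }
  return true;
}
```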
src/ndarray/ndarray.cc
Outdated
```cpp
if (!is_none()) {
  dlmanager->tensor.dl_tensor = data().dltensor();
  // assign value for dl_tensor.strides
  if (!dlmanager->tensor.dl_tensor.strides) {
```
strides == nullptr; see my additional comments in #11964.
@tqchen Thank you!
include/mxnet/c_api.h
Outdated
```c
 * \param out_dlpack pointer holder to get pointer of DLManagedTensor
 * \return 0 when success, -1 when failure happens
 */
MXNET_DLL int MXNDArrayToDLPackForRead(NDArrayHandle handle,
```
From an API point of view, we can just expose ToDLPack and, in the Python API, explicitly call wait_for_read and wait_for_write.
I will modify it.
include/mxnet/c_api.h
Outdated
```c
 * \param dlpack_capsule the pointer of a PyCapsule storing DLManagedTensor
 * \return 0 when success, -1 when failure happens
 */
MXNET_DLL void MXNDArrayCallDLPackCapsuleDeleter(PyObjectHandle dlpack_capsule);
```
I am less certain why we need the deleter function here; can it be handled directly on the Python/Cython side?
I tried to implement a deleter function in Python, but the deleter may be released by the Python GC before it is called (see the test code); this raises a segmentation fault. The Python frontend of MXNet uses both Python (ctypes) and Cython, and it may be impossible to implement the deleter function in ctypes alone. So the deleter function should be implemented in C++.
Take a look at the destructor in apache/tvm#1573.
There is some subtlety here, but it can nevertheless be implemented.
I once tried to write a Python function as the destructor, but it couldn't pass CI; please see the MXNet CI result. All Windows test_dlpack runs failed because the destructor written in Python was released before being called. PyTorch implements the destructor with the Python API in C++, and CuPy implements it in Cython, i.e. the code is built as C++. However, MXNet uses ctypes and Cython, and I couldn't find a better way to implement the destructor than writing it in the MXNet C++ API.
Yes. I knew the trick and tried it in my previous PR, but it failed in the Windows tests (related CI). It seems TVM's CI doesn't have Windows tests, so its CI passes. The reason is that the destructor is released by the Python GC before it is called, and the GC release order differs between Linux and Windows. On Linux the destructor is called first and then released, so it works; on Windows the destructor is released before it is called, so it does not work.
This is strange, as the destructor itself sits in the global scope and should be destructed after the dltensors (which have local scope).
Yes. In the test code, it works on Linux but fails on Windows.
I see a few problems in the particular gist you pasted:
- The destructor needs to be declared in global scope (instead of being constructed when passed as an argument).
- The C string needs to outlive the capsule (construct a global string).
- The function needs to outlive the capsule (construct the C function and put it under global/module scope).
```python
import ctypes

# The ctypes function type, the destructor, and the capsule-name string all
# live at module scope so they outlive any capsule created below.
cfunc = ctypes.CFUNCTYPE(None, ctypes.c_void_p)

def dfunc(dltensor):
    pycaps = ctypes.cast(dltensor, ctypes.py_object)
    pass

c_destructor = cfunc(dfunc)
c_str_dltensor = ctypes.c_char_p(b"dltensor")

def test():
    a = ctypes.pythonapi.PyCapsule_New(1, c_str_dltensor, c_destructor)

test()
```
Thank you! I found that it works on both Windows and Linux. I have updated the PR.
include/mxnet/tensor_blob.h
Outdated
```cpp
 * \brief constructor that construct TBlob from DLTensor
 * \param DLTensor Object
 */
explicit TBlob(const DLTensor &dltensor) : dptr_(dltensor.data),
```
Need to add a compactness check.
Specifically, TBlob only supports compact tensors; we need to check that strides == nullptr or that the strides reflect a compact layout.
Thanks. I will move the strides check from ndarray.cpp to tensor_blob.h.
python/mxnet/_ctypes/ndarray.py
Outdated
```diff
 # pylint: disable= no-member

-    def __init__(self, handle, writable=True):
+    def __init__(self, handle, writable=True, dlpack=None):
```
dlpack should not be a member; the PyCapsule manages itself.
The dlpack in NDArray is the PyCapsule returned by to_dlpack_for_read/to_dlpack_for_write. When FromDLPack is called, the NDArray needs to manage the release of NDArrayDLManager::handle in ndarray.cc, e.g.:

```python
a = mx.nd.array([1, 2, 3])  # denote by x the TBlob that stores the data
pack = a.to_dlpack_for_write()
b = mx.nd.from_dlpack(pack)
del a, pack
# a and the PyCapsule pack have been released;
# b needs to manage the release of TBlob x.
```

NDArray doesn't have a deleter function, so I made dlpack a member of NDArray; when the reference count of the dlpack in the NDArray drops to zero, the TBlob is released. Is there any other way to keep the reference?
A better way is to keep the NDArray's shared_ptr inside the manager_ctx itself; you can take a look at TVM's NDArray-to-DLManagedTensor implementation.
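A rough sketch of that pattern (the names below are illustrative, not the PR's exact code): the manager context owns a copy of the source NDArray, and the DLManagedTensor's deleter frees that copy, which drops the shared_ptr refcount.

```cpp
#include <dlpack/dlpack.h>
#include <mxnet/ndarray.h>

// Sketch only: keep the producing NDArray alive through manager_ctx.
struct NDArrayDLManagerSketch {
  mxnet::NDArray array;     // copy of the source array; holds its shared_ptr
  DLManagedTensor tensor;   // the object handed out to consumers
};

DLManagedTensor* ToDLPackSketch(const mxnet::NDArray& nd) {
  auto* ctx = new NDArrayDLManagerSketch();
  ctx->array = nd;                               // refcount of the data chunk +1
  ctx->tensor.dl_tensor = nd.data().dltensor();  // view into the same memory
  ctx->tensor.manager_ctx = ctx;
  ctx->tensor.deleter = [](DLManagedTensor* self) {
    delete static_cast<NDArrayDLManagerSketch*>(self->manager_ctx);  // refcount -1
  };
  return &ctx->tensor;
}
```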
NDArray in MXNet and TVM are different. TVM's NDArray has the functions IncRef and DecRef to change the reference count, whereas MXNet's uses NDArray::ptr_ (a std::shared_ptr) to manage the reference count, and NDArray::ptr_ is a private member of NDArray. The PR is similar to the PyTorch DLPack implementation (code). I added an NDArrayDLManager to manage the reference count of the NDArray (line 315 in src/ndarray/ndarray.cc). Setting dlpack as a member of the NDArray (Python class) is to avoid releasing the data storage: when the original NDArray and the PyCapsule (DLPack) are released, the new NDArray (generated by from_dlpack) still exists.
You can create a new NDArray() that copies the original NDArray (which increases the refcount) and put that in as the context.
In your case, when a gets deleted, b still holds an NDArrayDLManager, which is allocated by new, and that object still holds an NDArray (which holds a shared_ptr), so the original resource won't be released.
You need to be careful to use the shape from the same NDArray in your NDArrayDLManager.
Thanks. But I think the new NDArray object b couldn't hold the same shared_ptr as the original NDArray a; the new NDArray b only gets the raw data pointer, not a shared_ptr, from DLPack.

Consider the other case:

```python
import torch
import mxnet as mx
from torch.utils import dlpack

a = torch.tensor([1, 2, 3])
pack = dlpack.to_dlpack(a)
b = mx.nd.from_dlpack(pack)
del a, pack
```

When dlpack.to_dlpack is called, PyTorch allocates an ATenDLMTensor, which increases the refcount of the Torch tensor (code). After the variables a and pack are released, the ATenDLMTensor still exists.

I think the deleter should be called by the new NDArray b when b is released (refer to PyTorch's fromDLPack); however, NDArray doesn't have an explicit deleter parameter. In my PR, from_dlpack copies the dlpack object. When the old dlpack pack is released, it doesn't call the deleter. The new dlpack b.dlpack becomes a member of the new NDArray b, as NDArray(handle=handle, dlpack=dlpack_copy). When the new NDArray b is released, the new dlpack b.dlpack is released and calls the deleter, and the deleter releases the NDArrayDLManager or ATenDLMTensor; the refcount of the original NDArray a then decreases by 1.
If you copy the NDArray, the copies hold the same shared_ptr to the data; note that a shared_ptr can be copied, and its reference count is managed automatically.
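To illustrate the refcounting being described here, a tiny standalone sketch (nothing MXNet-specific):

```cpp
#include <cassert>
#include <memory>

int main() {
  auto data = std::make_shared<int>(42);  // use_count == 1
  {
    std::shared_ptr<int> copy = data;     // copying bumps use_count to 2
    int* raw = copy.get();                // a consumer may only see the raw pointer,
    assert(*raw == 42);                   // but `copy` keeps the storage alive
  }                                       // `copy` destroyed: use_count back to 1
  assert(data.use_count() == 1);
  return 0;
}
```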
I made a copy of the NDArray a member of NDArrayDLManager, and the copy increases the refcount. But I'm confused about how to modify the PR: after creating a new NDArray (Python) from DLPack and then deleting the old NDArray (Python) and the PyCapsule (DLPack), which object will call the deleter function? In my case, when a gets deleted, how does b hold the NDArrayDLManager? It seems that b only gets the raw data pointer from DLManagedTensor::dl_tensor::data, which is not a shared pointer. And how does b store the pointer to the NDArrayDLManager in an MXNet NDArray?
@tqchen
Since it is a C API change, @piiswrong can you also take a look?
python/mxnet/ndarray/ndarray.py
Outdated
```python
check_call(_LIB.MXNDArrayWaitToWrite(data.handle))
dlpack = DLPackHandle()
check_call(_LIB.MXNDArrayToDLPack(data.handle, ctypes.byref(dlpack)))
return ctypes.pythonapi.PyCapsule_New(dlpack, b'dltensor', pycapsule_dlpack_deleter)
```
You cannot directly pass in the string literal; use a global string object instead, since PyCapsule requires the name string to outlive the capsule.
Solved it. Thank you!
src/c_api/c_api.cc
Outdated
```c
typedef struct {
  char py_object[16];
```
This seems dangerous unless the PyCapsule layout is already standardized and won't change.
Yes, it's dangerous. I want to call the function PyCapsule_GetPointer in c_api.cc, but MXNet doesn't include the Python.h header file.
I have removed it in the latest PR.
@tqchen

TVM src/runtime/ndarray.cc#L148:

```cpp
NDArray NDArray::FromDLPack(DLManagedTensor* tensor) {
  NDArray::Container* data = new NDArray::Container();
  data->deleter = Internal::DLPackDeleter;
  data->manager_ctx = tensor;
  data->dl_tensor = tensor->dl_tensor;
  return NDArray(data);
}
```

PyTorch aten/src/ATen/DLConvertor.cpp#L170:

```cpp
Tensor fromDLPack(const DLManagedTensor* src) {
  Backend backend = getATenBackend(src->dl_tensor.ctx);
  ScalarType stype = toScalarType(src->dl_tensor.dtype);
  auto deleter = [src](void * self) {
    src->deleter(const_cast<DLManagedTensor*>(src));
  };
  return getType(backend, stype).tensorFromBlob(
      src->dl_tensor.data,
      IntList(src->dl_tensor.shape, src->dl_tensor.ndim),
      IntList(src->dl_tensor.strides, src->dl_tensor.ndim),
      deleter);
}
```

CuPy cupy/core/dlpack.pyx#L231:

```python
mem = DLPackMemory(dltensor)
mem_ptr = memory.MemoryPointer(mem, mem.dlm_tensor.dl_tensor.byte_offset)
cupy_array = ndarray(shape_vec, cp_dtype, mem_ptr)
```

All of these NDArray implementations save the deleter of the DLPack DLManagedTensor, so I added a new deleter member.
My point is that the DLManagedTensor should manage its own deletion, and you do not have to add a dlpack field to mx.nd.NDArray. Even when the frontend NDArray is released, the internal data will be kept alive until the PyCapsule is released and the DLManagedTensor's deleter is called.
@tqchen Hi! I have modified the PR; the DLManagedTensor now manages itself.

```python
import mxnet as mx

pack = mx.nd.array([1, 2, 3]).to_dlpack_for_write()
b = mx.nd.from_dlpack(pack)
del pack  # the DLManagedTensor's deleter gets called when `pack` is released
print(b)  # b != [1,2,3]
```

Does it need to manage the release of the dlpack in the NDArray Chunk (C++)? I added a deleter_ member for this.
pack is only supposed to be consumed once: when from_dlpack is called, the pack itself should be marked as used, and its deleter will no longer be called when the capsule gets destroyed.
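Sketched with the CPython C API (illustrative only; the PR performs the equivalent steps via ctypes in ndarray.py, and the helper name here is made up), the consume-once convention renames the capsule so its destructor knows ownership has moved:

```cpp
#include <Python.h>
#include <dlpack/dlpack.h>

// Sketch only: take ownership of the DLManagedTensor from a "dltensor" capsule.
DLManagedTensor* ConsumeDLPackCapsule(PyObject* capsule) {
  // A capsule still named "dltensor" has not been consumed yet.
  if (!PyCapsule_IsValid(capsule, "dltensor")) {
    return nullptr;  // already consumed ("used_dltensor") or not a DLPack capsule
  }
  auto* managed = static_cast<DLManagedTensor*>(
      PyCapsule_GetPointer(capsule, "dltensor"));
  // Rename the capsule so its destructor will not call the deleter again;
  // the consumer now owns `managed` and must eventually call managed->deleter.
  PyCapsule_SetName(capsule, "used_dltensor");
  return managed;
}
```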
@tqchen Yes. The latest PR implements this in ndarray.py.
@tqchen @piiswrong
include/mxnet/tensor_blob.h
Outdated
```diff
@@ -89,7 +92,8 @@ class TBlob {
   template<typename DType>
   TBlob(DType *dptr, const TShape &shape, int dev_mask, int dev_id = -1)
       : dptr_(dptr), shape_(shape),
-        type_flag_(mshadow::DataType<DType>::kFlag) {
+        type_flag_(mshadow::DataType<DType>::kFlag),
+        deleter_(nullptr) {
```
We cannot introduce a deleter to the TBlob.
include/mxnet/ndarray.h
Outdated
```diff
@@ -831,7 +855,7 @@ class NDArray {

   Chunk(const NDArrayStorageType storage_type_, const TBlob &data,
         const std::vector<TBlob> &aux_data, int dev_id)
-      : static_data(true), delay_alloc(false), storage_type(storage_type_) {
+      : static_data(true), delay_alloc(false), storage_type(storage_type_), deleter_(nullptr) {
```
We cannot plug a deleter into TBlob and Chunk; this seems like it will cause more problems than it solves. std::shared_ptr already supports a customized deleter, and maybe that can be used to solve this issue instead of hiding a customized deleter in NDArray.
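For example, a shared_ptr with a custom deleter can tie the lifetime of a borrowed DLPack buffer to the consuming array's data. This is a sketch under the assumption that the consumer stores such a shared_ptr somewhere; it is not the PR's actual code, and the function name is made up.

```cpp
#include <dlpack/dlpack.h>
#include <memory>

// Sketch only: wrap memory borrowed from a DLManagedTensor so that the
// producer's deleter runs when the last reference goes away.
std::shared_ptr<void> WrapDLPackData(DLManagedTensor* managed) {
  return std::shared_ptr<void>(
      managed->dl_tensor.data,
      [managed](void*) {
        if (managed->deleter) {
          managed->deleter(managed);  // hand the buffer back to the producer
        }
      });
}
```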
@wkcn Sorry, I was away on vacation recently; please see my comments.
@tqchen Thank you! I will try to change it.
@tqchen Hi! I have updated the code per your comments.
I think it is good to go from my end, thanks @wkcn. @piiswrong please take another look; if there are no issues, I will merge this in two days.
@szha can you also take a round of review and, if there is no problem, merge this in?
@tqchen Thank you so much!
This is now merged.
Description
Add DLPack Transformation API
DLPack is important for sharing tensors and operators among Deep Learning Frameworks.
Issue: #11964
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
- Add MXNDArrayToDLPack, MXNDArrayFromDLPack, MXNDArrayCallDLPackDeleter and MXNDArrayCallDLPackCapsuleDeleter in c_api
- Add ToDLPack() and FromDLPack for the class NDArray
- Add TBlob(const DLTensor &dltensor)
- Add DLDataTypeTransform in tensor_blob.h for the transformation from DLDataType to MShadow Type Flag
- Add the NDArray(const TBlob &data, int dev_id, const std::function<void()>& deleter) function
- Add mx.nd.NDArray.to_dlpack_for_read, mx.nd.NDArray.to_dlpack_for_write, mx.nd.to_dlpack_for_read, mx.nd.to_dlpack_for_write and mx.nd.from_dlpack
Comments
Extra Test with PyTorch